Overview

Dataset statistics

Number of variables: 12
Number of observations: 150000
Missing cells: 33655
Missing cells (%): 1.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 13.7 MiB
Average record size in memory: 96.0 B
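
The headline figures can be cross-checked with a few lines of arithmetic. This is a minimal sketch, not the profiler's own code: the per-column missing counts are copied from the warnings in this report, and the 8-bytes-per-value record layout is an assumption that happens to match the reported 96.0 B.

```python
# Figures copied from this report; the formulas are assumptions about how
# the summary values are derived, not the profiler's implementation.
n_variables = 12
n_observations = 150_000
missing_per_column = {"MonthlyIncome": 29_731, "NumberOfDependents": 3_924}

total_cells = n_variables * n_observations
missing_cells = sum(missing_per_column.values())
missing_pct = 100 * missing_cells / total_cells

# Assumption: every value is stored in 8 bytes (e.g. int64 / float64).
record_bytes = n_variables * 8
total_mib = n_observations * record_bytes / 2**20

print(missing_cells)              # 33655
print(f"{missing_pct:.1f}%")      # 1.9%
print(record_bytes)               # 96
print(f"{total_mib:.1f} MiB")     # 13.7 MiB
```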

Variable types

Numeric: 11
Categorical: 1

Warnings

NumberOfTime30-59DaysPastDueNotWorse is highly correlated with NumberOfTimes90DaysLate and 1 other field [High correlation]
NumberOfTimes90DaysLate is highly correlated with NumberOfTime30-59DaysPastDueNotWorse and 1 other field [High correlation]
NumberOfTime60-89DaysPastDueNotWorse is highly correlated with NumberOfTime30-59DaysPastDueNotWorse and 1 other field [High correlation]
MonthlyIncome has 29731 (19.8%) missing values [Missing]
NumberOfDependents has 3924 (2.6%) missing values [Missing]
RevolvingUtilizationOfUnsecuredLines is highly skewed (γ1 = 97.63157449) [Skewed]
NumberOfTime30-59DaysPastDueNotWorse is highly skewed (γ1 = 22.59710756) [Skewed]
DebtRatio is highly skewed (γ1 = 95.15779287) [Skewed]
MonthlyIncome is highly skewed (γ1 = 114.0403179) [Skewed]
NumberOfTimes90DaysLate is highly skewed (γ1 = 23.08734547) [Skewed]
NumberOfTime60-89DaysPastDueNotWorse is highly skewed (γ1 = 23.33174312) [Skewed]
df_index is uniformly distributed [Uniform]
df_index has unique values [Unique]
RevolvingUtilizationOfUnsecuredLines has 10878 (7.3%) zeros [Zeros]
NumberOfTime30-59DaysPastDueNotWorse has 126018 (84.0%) zeros [Zeros]
DebtRatio has 4113 (2.7%) zeros [Zeros]
MonthlyIncome has 1634 (1.1%) zeros [Zeros]
NumberOfOpenCreditLinesAndLoans has 1888 (1.3%) zeros [Zeros]
NumberOfTimes90DaysLate has 141662 (94.4%) zeros [Zeros]
NumberRealEstateLoansOrLines has 56188 (37.5%) zeros [Zeros]
NumberOfTime60-89DaysPastDueNotWorse has 142396 (94.9%) zeros [Zeros]
NumberOfDependents has 86902 (57.9%) zeros [Zeros]
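
The γ1 values in the skewness warnings are sample skewness coefficients. The sketch below shows one common estimator, the bias-adjusted Fisher-Pearson coefficient; that the report uses exactly this estimator is an assumption, and `adjusted_skewness` is a hypothetical helper name.

```python
import math

def adjusted_skewness(values):
    """Bias-adjusted Fisher-Pearson sample skewness (assumed estimator)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # constant data: treat skewness as zero
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1

symmetric = [1, 2, 3, 4, 5]          # no tail on either side
long_right_tail = [0, 0, 0, 0, 10]   # mostly zeros with one large value

print(adjusted_skewness(symmetric))        # 0.0
print(adjusted_skewness(long_right_tail))  # positive: tail to the right
```

Columns that are mostly zeros with a few very large values, such as the past-due counters flagged above, produce large positive γ1 for the same reason as the second toy list.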

Reproduction

Analysis started: 2021-04-24 11:37:11.624048
Analysis finished: 2021-04-24 11:37:45.956069
Duration: 34.33 seconds
Software version: pandas-profiling v2.11.0
Download configuration: config.yaml

Variables

df_index
Real number (ℝ≥0)

UNIFORM
UNIQUE

Distinct: 150000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 75000.5
Minimum: 1
Maximum: 150000
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 1
5-th percentile: 7500.95
Q1: 37500.75
median: 75000.5
Q3: 112500.25
95-th percentile: 142500.05
Maximum: 150000
Range: 149999
Interquartile range (IQR): 74999.5
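
The fractional quantiles above are what linear interpolation between neighbouring order statistics yields for the values 1 to 150000. A minimal sketch follows; that the report uses this interpolation rule is an assumption, and `percentile` is a hypothetical helper.

```python
def percentile(sorted_values, q):
    """q-th percentile (0..100) by linear interpolation between neighbours."""
    pos = (len(sorted_values) - 1) * q / 100
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + frac * (sorted_values[upper] - sorted_values[lower])

values = range(1, 150_001)  # df_index takes every integer from 1 to 150000

p5 = percentile(values, 5)
q1 = percentile(values, 25)
q3 = percentile(values, 75)
p95 = percentile(values, 95)
iqr = q3 - q1

print(round(p5, 2), q1, q3, round(p95, 2))  # 7500.95 37500.75 112500.25 142500.05
print(iqr)                                  # 74999.5
```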

Descriptive statistics

Standard deviation: 43301.41453
Coefficient of variation (CV): 0.5773483447
Kurtosis: -1.2
Mean: 75000.5
Median Absolute Deviation (MAD): 37500
Skewness: 0
Sum: 1.1250075 × 10^10
Variance: 1875012500
Monotonicity: Strictly increasing
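
Because df_index is simply the integers 1 to 150000, its descriptive statistics have closed forms, which makes a convenient sanity check. This is a sketch under the assumption that the reported variance is the sample variance (divisor n - 1).

```python
import math

n = 150_000  # df_index runs over 1..n

total = n * (n + 1) // 2        # sum of 1..n
mean = total / n                # (n + 1) / 2
variance = n * (n + 1) / 12     # sample variance (ddof=1) of 1..n
std = math.sqrt(variance)
cv = std / mean                 # coefficient of variation

print(total)     # 11250075000, i.e. 1.1250075 × 10^10
print(mean)      # 75000.5
print(variance)  # 1875012500.0
print(round(std, 5), round(cv, 10))
```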
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
2047 | 1 | < 0.1%
107806 | 1 | < 0.1%
9518 | 1 | < 0.1%
15661 | 1 | < 0.1%
13612 | 1 | < 0.1%
3371 | 1 | < 0.1%
1322 | 1 | < 0.1%
7465 | 1 | < 0.1%
5416 | 1 | < 0.1%
27943 | 1 | < 0.1%
Other values (149990) | 149990 | > 99.9%

Minimum 5 values

Value | Count | Frequency (%)
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
150000 | 1 | < 0.1%
149999 | 1 | < 0.1%
149998 | 1 | < 0.1%
149997 | 1 | < 0.1%
149996 | 1 | < 0.1%

SeriousDlqin2yrs
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value | Count
0 | 139974
1 | 10026

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 150000
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Value | Count | Frequency (%)
0 | 139974 | 93.3%
1 | 10026 | 6.7%

[Figure: Histogram of lengths of the category]

Value | Count | Frequency (%)
0 | 139974 | 93.3%
1 | 10026 | 6.7%

Most occurring characters

Value | Count | Frequency (%)
0 | 139974 | 93.3%
1 | 10026 | 6.7%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 150000 | 100.0%

Most frequent character per category

Value | Count | Frequency (%)
0 | 139974 | 93.3%
1 | 10026 | 6.7%

Most occurring scripts

Value | Count | Frequency (%)
Common | 150000 | 100.0%

Most frequent character per script

Value | Count | Frequency (%)
0 | 139974 | 93.3%
1 | 10026 | 6.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 150000 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
0 | 139974 | 93.3%
1 | 10026 | 6.7%

RevolvingUtilizationOfUnsecuredLines
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 125728
Distinct (%): 83.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.048438055
Minimum: 0
Maximum: 50708
Zeros: 10878
Zeros (%): 7.3%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0.029867442
median: 0.154180737
Q3: 0.5590462475
95-th percentile: 0.9999999
Maximum: 50708
Range: 50708
Interquartile range (IQR): 0.5291788055

Descriptive statistics

Standard deviation: 249.7553706
Coefficient of variation (CV): 41.29254005
Kurtosis: 14544.71341
Mean: 6.048438055
Median Absolute Deviation (MAD): 0.148325347
Skewness: 97.63157449
Sum: 907265.7082
Variance: 62377.74516
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
0 | 10878 | 7.3%
0.9999999 | 10256 | 6.8%
1 | 17 | < 0.1%
0.9500998 | 8 | < 0.1%
0.71314741 | 6 | < 0.1%
0.007984032 | 6 | < 0.1%
0.954091816 | 6 | < 0.1%
0.796407186 | 5 | < 0.1%
0.850299401 | 5 | < 0.1%
0.538922156 | 5 | < 0.1%
Other values (125718) | 128808 | 85.9%

Minimum 5 values

Value | Count | Frequency (%)
0 | 10878 | 7.3%
8.37 × 10^-6 | 1 | < 0.1%
9.93 × 10^-6 | 1 | < 0.1%
1.25 × 10^-5 | 1 | < 0.1%
1.43 × 10^-5 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
50708 | 1 | < 0.1%
29110 | 1 | < 0.1%
22198 | 1 | < 0.1%
22000 | 1 | < 0.1%
20514 | 1 | < 0.1%

age
Real number (ℝ≥0)

Distinct: 86
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 52.29520667
Minimum: 0
Maximum: 109
Zeros: 1
Zeros (%): < 0.1%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 29
Q1: 41
median: 52
Q3: 63
95-th percentile: 78
Maximum: 109
Range: 109
Interquartile range (IQR): 22

Descriptive statistics

Standard deviation: 14.77186586
Coefficient of variation (CV): 0.2824707426
Kurtosis: -0.4946688326
Mean: 52.29520667
Median Absolute Deviation (MAD): 11
Skewness: 0.1889945451
Sum: 7844281
Variance: 218.2080211
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
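Common values

Value | Count | Frequency (%)
49 | 3837 | 2.6%
48 | 3806 | 2.5%
50 | 3753 | 2.5%
63 | 3719 | 2.5%
47 | 3719 | 2.5%
46 | 3714 | 2.5%
53 | 3648 | 2.4%
51 | 3627 | 2.4%
52 | 3609 | 2.4%
56 | 3589 | 2.4%
Other values (76) | 112979 | 75.3%

Minimum 5 values

Value | Count | Frequency (%)
0 | 1 | < 0.1%
21 | 183 | 0.1%
22 | 434 | 0.3%
23 | 641 | 0.4%
24 | 816 | 0.5%

Maximum 5 values

Value | Count | Frequency (%)
109 | 2 | < 0.1%
107 | 1 | < 0.1%
105 | 1 | < 0.1%
103 | 3 | < 0.1%
102 | 3 | < 0.1%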

NumberOfTime30-59DaysPastDueNotWorse
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 16
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4210333333
Minimum: 0
Maximum: 98
Zeros: 126018
Zeros (%): 84.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 2
Maximum: 98
Range: 98
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 4.192781272
Coefficient of variation (CV): 9.958311944
Kurtosis: 522.3765449
Mean: 0.4210333333
Median Absolute Deviation (MAD): 0
Skewness: 22.59710756
Sum: 63155
Variance: 17.57941479
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=16)
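Common values

Value | Count | Frequency (%)
0 | 126018 | 84.0%
1 | 16033 | 10.7%
2 | 4598 | 3.1%
3 | 1754 | 1.2%
4 | 747 | 0.5%
5 | 342 | 0.2%
98 | 264 | 0.2%
6 | 140 | 0.1%
7 | 54 | < 0.1%
8 | 25 | < 0.1%
Other values (6) | 25 | < 0.1%

Minimum 5 values

Value | Count | Frequency (%)
0 | 126018 | 84.0%
1 | 16033 | 10.7%
2 | 4598 | 3.1%
3 | 1754 | 1.2%
4 | 747 | 0.5%

Maximum 5 values

Value | Count | Frequency (%)
98 | 264 | 0.2%
96 | 5 | < 0.1%
13 | 1 | < 0.1%
12 | 2 | < 0.1%
11 | 1 | < 0.1%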

DebtRatio
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct: 114194
Distinct (%): 76.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 353.0050758
Minimum: 0
Maximum: 329664
Zeros: 4113
Zeros (%): 2.7%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0.004329004
Q1: 0.1750738323
median: 0.366507841
Q3: 0.8682537732
95-th percentile: 2449
Maximum: 329664
Range: 329664
Interquartile range (IQR): 0.693179941

Descriptive statistics

Standard deviation: 2037.818523
Coefficient of variation (CV): 5.772774
Kurtosis: 13734.28886
Mean: 353.0050758
Median Absolute Deviation (MAD): 0.2457227975
Skewness: 95.15779287
Sum: 52950761.36
Variance: 4152704.333
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
0 | 4113 | 2.7%
1 | 229 | 0.2%
4 | 174 | 0.1%
2 | 170 | 0.1%
3 | 162 | 0.1%
5 | 143 | 0.1%
9 | 125 | 0.1%
10 | 117 | 0.1%
7 | 115 | 0.1%
13 | 114 | 0.1%
Other values (114184) | 144538 | 96.4%

Minimum 5 values

Value | Count | Frequency (%)
0 | 4113 | 2.7%
2.6 × 10^-5 | 1 | < 0.1%
3.69 × 10^-5 | 1 | < 0.1%
3.93 × 10^-5 | 1 | < 0.1%
6.62 × 10^-5 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
329664 | 1 | < 0.1%
326442 | 1 | < 0.1%
307001 | 1 | < 0.1%
220516 | 1 | < 0.1%
168835 | 1 | < 0.1%

MonthlyIncome
Real number (ℝ≥0)

MISSING
SKEWED
ZEROS

Distinct: 13594
Distinct (%): 11.3%
Missing: 29731
Missing (%): 19.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 6670.221237
Minimum: 0
Maximum: 3008750
Zeros: 1634
Zeros (%): 1.1%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 1300
Q1: 3400
median: 5400
Q3: 8249
95-th percentile: 14587.6
Maximum: 3008750
Range: 3008750
Interquartile range (IQR): 4849

Descriptive statistics

Standard deviation: 14384.67422
Coefficient of variation (CV): 2.15655129
Kurtosis: 19504.7054
Mean: 6670.221237
Median Absolute Deviation (MAD): 2317
Skewness: 114.0403179
Sum: 802220838
Variance: 206918852.3
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value | Count | Frequency (%)
5000 | 2757 | 1.8%
4000 | 2106 | 1.4%
6000 | 1934 | 1.3%
3000 | 1758 | 1.2%
0 | 1634 | 1.1%
2500 | 1551 | 1.0%
10000 | 1466 | 1.0%
3500 | 1360 | 0.9%
4500 | 1226 | 0.8%
7000 | 1223 | 0.8%
Other values (13584) | 103254 | 68.8%
(Missing) | 29731 | 19.8%

Minimum 5 values

Value | Count | Frequency (%)
0 | 1634 | 1.1%
1 | 605 | 0.4%
2 | 6 | < 0.1%
4 | 2 | < 0.1%
5 | 2 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
3008750 | 1 | < 0.1%
1794060 | 1 | < 0.1%
1560100 | 1 | < 0.1%
1072500 | 1 | < 0.1%
835040 | 1 | < 0.1%

NumberOfOpenCreditLinesAndLoans
Real number (ℝ≥0)

ZEROS

Distinct: 58
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.45276
Minimum: 0
Maximum: 58
Zeros: 1888
Zeros (%): 1.3%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 2
Q1: 5
median: 8
Q3: 11
95-th percentile: 18
Maximum: 58
Range: 58
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 5.14595099
Coefficient of variation (CV): 0.6087894356
Kurtosis: 3.091066746
Mean: 8.45276
Median Absolute Deviation (MAD): 3
Skewness: 1.21531378
Sum: 1267914
Variance: 26.48081159
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
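Common values

Value | Count | Frequency (%)
6 | 13614 | 9.1%
7 | 13245 | 8.8%
5 | 12931 | 8.6%
8 | 12562 | 8.4%
4 | 11609 | 7.7%
9 | 11355 | 7.6%
10 | 9624 | 6.4%
3 | 9058 | 6.0%
11 | 8321 | 5.5%
12 | 7005 | 4.7%
Other values (48) | 40676 | 27.1%

Minimum 5 values

Value | Count | Frequency (%)
0 | 1888 | 1.3%
1 | 4438 | 3.0%
2 | 6666 | 4.4%
3 | 9058 | 6.0%
4 | 11609 | 7.7%

Maximum 5 values

Value | Count | Frequency (%)
58 | 1 | < 0.1%
57 | 2 | < 0.1%
56 | 2 | < 0.1%
54 | 4 | < 0.1%
53 | 1 | < 0.1%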

NumberOfTimes90DaysLate
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 19
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.2659733333
Minimum: 0
Maximum: 98
Zeros: 141662
Zeros (%): 94.4%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 1
Maximum: 98
Range: 98
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 4.169303788
Coefficient of variation (CV): 15.67564588
Kurtosis: 537.7389446
Mean: 0.2659733333
Median Absolute Deviation (MAD): 0
Skewness: 23.08734547
Sum: 39896
Variance: 17.38309407
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=19)
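Common values

Value | Count | Frequency (%)
0 | 141662 | 94.4%
1 | 5243 | 3.5%
2 | 1555 | 1.0%
3 | 667 | 0.4%
4 | 291 | 0.2%
98 | 264 | 0.2%
5 | 131 | 0.1%
6 | 80 | 0.1%
7 | 38 | < 0.1%
8 | 21 | < 0.1%
Other values (9) | 48 | < 0.1%

Minimum 5 values

Value | Count | Frequency (%)
0 | 141662 | 94.4%
1 | 5243 | 3.5%
2 | 1555 | 1.0%
3 | 667 | 0.4%
4 | 291 | 0.2%

Maximum 5 values

Value | Count | Frequency (%)
98 | 264 | 0.2%
96 | 5 | < 0.1%
17 | 1 | < 0.1%
15 | 2 | < 0.1%
14 | 2 | < 0.1%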

NumberRealEstateLoansOrLines
Real number (ℝ≥0)

ZEROS

Distinct: 28
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.01824
Minimum: 0
Maximum: 54
Zeros: 56188
Zeros (%): 37.5%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 1
Q3: 2
95-th percentile: 3
Maximum: 54
Range: 54
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.129770985
Coefficient of variation (CV): 1.109533101
Kurtosis: 60.47680765
Mean: 1.01824
Median Absolute Deviation (MAD): 1
Skewness: 3.482483994
Sum: 152736
Variance: 1.276382478
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=28)
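Common values

Value | Count | Frequency (%)
0 | 56188 | 37.5%
1 | 52338 | 34.9%
2 | 31522 | 21.0%
3 | 6300 | 4.2%
4 | 2170 | 1.4%
5 | 689 | 0.5%
6 | 320 | 0.2%
7 | 171 | 0.1%
8 | 93 | 0.1%
9 | 78 | 0.1%
Other values (18) | 131 | 0.1%

Minimum 5 values

Value | Count | Frequency (%)
0 | 56188 | 37.5%
1 | 52338 | 34.9%
2 | 31522 | 21.0%
3 | 6300 | 4.2%
4 | 2170 | 1.4%

Maximum 5 values

Value | Count | Frequency (%)
54 | 1 | < 0.1%
32 | 1 | < 0.1%
29 | 1 | < 0.1%
26 | 1 | < 0.1%
25 | 3 | < 0.1%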

NumberOfTime60-89DaysPastDueNotWorse
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 13
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.2403866667
Minimum: 0
Maximum: 98
Zeros: 142396
Zeros (%): 94.9%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 1
Maximum: 98
Range: 98
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 4.155179421
Coefficient of variation (CV): 17.28539889
Kurtosis: 545.6827435
Mean: 0.2403866667
Median Absolute Deviation (MAD): 0
Skewness: 23.33174312
Sum: 36058
Variance: 17.26551602
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=13)
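Common values

Value | Count | Frequency (%)
0 | 142396 | 94.9%
1 | 5731 | 3.8%
2 | 1118 | 0.7%
3 | 318 | 0.2%
98 | 264 | 0.2%
4 | 105 | 0.1%
5 | 34 | < 0.1%
6 | 16 | < 0.1%
7 | 9 | < 0.1%
96 | 5 | < 0.1%
Other values (3) | 4 | < 0.1%

Minimum 5 values

Value | Count | Frequency (%)
0 | 142396 | 94.9%
1 | 5731 | 3.8%
2 | 1118 | 0.7%
3 | 318 | 0.2%
4 | 105 | 0.1%

Maximum 5 values

Value | Count | Frequency (%)
98 | 264 | 0.2%
96 | 5 | < 0.1%
11 | 1 | < 0.1%
9 | 1 | < 0.1%
8 | 2 | < 0.1%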

NumberOfDependents
Real number (ℝ≥0)

MISSING
ZEROS

Distinct: 13
Distinct (%): < 0.1%
Missing: 3924
Missing (%): 2.6%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.7572222679
Minimum: 0
Maximum: 20
Zeros: 86902
Zeros (%): 57.9%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 1
95-th percentile: 3
Maximum: 20
Range: 20
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.115086071
Coefficient of variation (CV): 1.472600739
Kurtosis: 3.001656811
Mean: 0.7572222679
Median Absolute Deviation (MAD): 0
Skewness: 1.588242379
Sum: 110612
Variance: 1.243416947
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=13)
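Common values

Value | Count | Frequency (%)
0 | 86902 | 57.9%
1 | 26316 | 17.5%
2 | 19522 | 13.0%
3 | 9483 | 6.3%
4 | 2862 | 1.9%
5 | 746 | 0.5%
6 | 158 | 0.1%
7 | 51 | < 0.1%
8 | 24 | < 0.1%
9 | 5 | < 0.1%
Other values (3) | 7 | < 0.1%
(Missing) | 3924 | 2.6%

Minimum 5 values

Value | Count | Frequency (%)
0 | 86902 | 57.9%
1 | 26316 | 17.5%
2 | 19522 | 13.0%
3 | 9483 | 6.3%
4 | 2862 | 1.9%

Maximum 5 values

Value | Count | Frequency (%)
20 | 1 | < 0.1%
13 | 1 | < 0.1%
10 | 5 | < 0.1%
9 | 5 | < 0.1%
8 | 24 | < 0.1%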

Interactions

[Figures: 110 interaction plots (Matplotlib v3.3.4); images not reproduced in this text version]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
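
The calculation described above can be sketched in plain Python. `pearson_r` is a hypothetical helper written for illustration, not the routine the report itself calls; the normalising factors of the covariance and the standard deviations cancel, so they are omitted.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The common 1/(n-1) factors cancel between numerator and denominator.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

xs = [1, 2, 3, 4, 5]
increasing = [2 * x + 1 for x in xs]    # exact positive linear relation
decreasing = [10 - 3 * x for x in xs]   # exact negative linear relation

print(round(pearson_r(xs, increasing), 6))   # 1.0
print(round(pearson_r(xs, decreasing), 6))   # -1.0
```

Rescaling or shifting either input (for example `[100 * x + 7 for x in xs]`) leaves the result unchanged, which is the invariance property mentioned above.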

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
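
A minimal sketch of that recipe: rank both variables, then apply the Pearson formula to the ranks. Assigning tied values the average of their positions is an assumption (a common convention), and both helper names are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average position of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    ss_x = sum((a - mean_x) ** 2 for a in rx)
    ss_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(ss_x * ss_y)

xs = [1, 2, 3, 4, 5]
cubes = [x ** 3 for x in xs]  # nonlinear but strictly increasing

print(average_ranks([10, 20, 20, 30]))    # [1.0, 2.5, 2.5, 4.0]
print(round(spearman_rho(xs, cubes), 6))  # 1.0
```

The cubic relation is not linear, so Pearson's r on the raw values would be below 1, while ρ reaches 1 because the ranks agree exactly.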

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
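
The pair-counting definition above translates directly into code. The sketch follows the description literally (often called tau-a); statistical libraries frequently apply a tie-adjusted variant instead, so values can differ when the data contain ties. `kendall_tau_a` is a hypothetical helper, and its quadratic cost makes it suitable only for small inputs.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs; ties count as neither."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move the same way
        elif direction < 0:
            discordant += 1   # the variables move in opposite ways
    total_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / total_pairs

print(kendall_tau_a([1, 2, 3], [1, 2, 3]))   # 1.0
print(kendall_tau_a([1, 2, 3], [3, 2, 1]))   # -1.0
# One swapped pair out of six: (5 - 1) / 6
print(round(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]), 4))   # 0.6667
```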

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
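
To make the nullity correlation mentioned above concrete, the sketch below correlates the missingness indicators of two columns. It assumes nullity correlation is the Pearson correlation of 0/1 "is missing" flags, and it uses only the first ten sample rows of this report (MonthlyIncome is NaN in rows 6 and 8, NumberOfDependents in row 8), so the number is illustrative rather than the value shown in the heatmap.

```python
import math

NAN = float("nan")

# First ten sample rows of this report, two columns only.
monthly_income = [9120.0, 2600.0, 3042.0, 3300.0, 63588.0,
                  3500.0, NAN, 3500.0, NAN, 23684.0]
dependents = [2.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, NAN, 2.0]

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missingness flags of two columns."""
    a = [1.0 if math.isnan(v) else 0.0 for v in col_a]
    b = [1.0 if math.isnan(v) else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(ss_a * ss_b)

print(round(nullity_correlation(monthly_income, dependents), 4))   # 0.6667
```

A positive value means the two columns tend to be missing in the same rows, which is the pattern the sample rows suggest here.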

Sample

First rows

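(index) | df_index | SeriousDlqin2yrs | RevolvingUtilizationOfUnsecuredLines | age | NumberOfTime30-59DaysPastDueNotWorse | DebtRatio | MonthlyIncome | NumberOfOpenCreditLinesAndLoans | NumberOfTimes90DaysLate | NumberRealEstateLoansOrLines | NumberOfTime60-89DaysPastDueNotWorse | NumberOfDependents
0 | 1 | 1 | 0.766127 | 45 | 2 | 0.802982 | 9120.0 | 13 | 0 | 6 | 0 | 2.0
1 | 2 | 0 | 0.957151 | 40 | 0 | 0.121876 | 2600.0 | 4 | 0 | 0 | 0 | 1.0
2 | 3 | 0 | 0.658180 | 38 | 1 | 0.085113 | 3042.0 | 2 | 1 | 0 | 0 | 0.0
3 | 4 | 0 | 0.233810 | 30 | 0 | 0.036050 | 3300.0 | 5 | 0 | 0 | 0 | 0.0
4 | 5 | 0 | 0.907239 | 49 | 1 | 0.024926 | 63588.0 | 7 | 0 | 1 | 0 | 0.0
5 | 6 | 0 | 0.213179 | 74 | 0 | 0.375607 | 3500.0 | 3 | 0 | 1 | 0 | 1.0
6 | 7 | 0 | 0.305682 | 57 | 0 | 5710.000000 | NaN | 8 | 0 | 3 | 0 | 0.0
7 | 8 | 0 | 0.754464 | 39 | 0 | 0.209940 | 3500.0 | 8 | 0 | 0 | 0 | 0.0
8 | 9 | 0 | 0.116951 | 27 | 0 | 46.000000 | NaN | 2 | 0 | 0 | 0 | NaN
9 | 10 | 0 | 0.189169 | 57 | 0 | 0.606291 | 23684.0 | 9 | 0 | 4 | 0 | 2.0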

Last rows

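(index) | df_index | SeriousDlqin2yrs | RevolvingUtilizationOfUnsecuredLines | age | NumberOfTime30-59DaysPastDueNotWorse | DebtRatio | MonthlyIncome | NumberOfOpenCreditLinesAndLoans | NumberOfTimes90DaysLate | NumberRealEstateLoansOrLines | NumberOfTime60-89DaysPastDueNotWorse | NumberOfDependents
149990 | 149991 | 0 | 0.055518 | 46 | 0 | 0.609779 | 4335.0 | 7 | 0 | 1 | 0 | 2.0
149991 | 149992 | 0 | 0.104112 | 59 | 0 | 0.477658 | 10316.0 | 10 | 0 | 2 | 0 | 0.0
149992 | 149993 | 0 | 0.871976 | 50 | 0 | 4132.000000 | NaN | 11 | 0 | 1 | 0 | 3.0
149993 | 149994 | 0 | 1.000000 | 22 | 0 | 0.000000 | 820.0 | 1 | 0 | 0 | 0 | 0.0
149994 | 149995 | 0 | 0.385742 | 50 | 0 | 0.404293 | 3400.0 | 7 | 0 | 0 | 0 | 0.0
149995 | 149996 | 0 | 0.040674 | 74 | 0 | 0.225131 | 2100.0 | 4 | 0 | 1 | 0 | 0.0
149996 | 149997 | 0 | 0.299745 | 44 | 0 | 0.716562 | 5584.0 | 4 | 0 | 1 | 0 | 2.0
149997 | 149998 | 0 | 0.246044 | 58 | 0 | 3870.000000 | NaN | 18 | 0 | 1 | 0 | 0.0
149998 | 149999 | 0 | 0.000000 | 30 | 0 | 0.000000 | 5716.0 | 4 | 0 | 0 | 0 | 0.0
149999 | 150000 | 0 | 0.850283 | 64 | 0 | 0.249908 | 8158.0 | 8 | 0 | 2 | 0 | 0.0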